Abstract: The air quality in China, particularly the PM2.5 (particles less than 2.5 μm in
aerodynamic diameter) level, has become an increasing public concern because of its 
relation to health risks. The distribution of PM2.5 concentrations has a close relationship 
with multiple geographic and socioeconomic factors, but the lack of reliable data has been 
the main obstacle to studying this topic. Based on the newly published Annual Average 
PM2.5 gridded data, together with land use data, gridded population data and Gross 
Domestic Product (GDP) data, this paper explored the spatial-temporal characteristics of 
PM2.5 concentrations and the factors impacting those concentrations in China for the years
2001-2010. The contributions of urban areas, high population and economic
development to PM2.5 concentrations were analyzed using the Geographically Weighted 
Regression (GWR) model. The results indicated that the spatial pattern of PM2.5
concentrations in China remained stable during the period 2001-2010; high concentrations 
of PM2.5 are mostly found in regions with high populations and rapid urban expansion, 
including the Beijing-Tianjin-Hebei region in North China, East China (including the 
Shandong, Anhui and Jiangsu provinces) and Henan province. Increasing populations, 
local economic growth and urban expansion are the three main driving forces impacting 
PM2.5 concentrations. 

Keywords: PM2.5; GDP; population; land use change; geographically weighted regression 



1. Introduction 

Fine particulate matter (PM2.5, i.e., particles less than 2.5 μm in aerodynamic diameter) is rich in
organic toxic components and has a strong association with many adverse health effects [1-3]. 
Epidemiological studies have reported associations between PM2.5 and a variety of medical diseases 
such as asthma, cardiovascular problems, respiratory infections, lung cancer [4] and breast cancer. 
The air quality in China, particularly the PM2.5 level, has become an increasing public concern because 
of its connection to such health risks. For example, Hu suggested that exposure to high PM levels may 
have deleterious effects on the duration of survival after a breast cancer diagnosis among females [5]. 
Zhang et al. estimated the number of people living in high-exposure areas in Beijing during the autumn 
of 2012 [6]. 

Accurate modeling of fine scale spatial variation in PM2.5 concentrations is critical for environmental 
and epidemiological studies. Land use regression (LUR) models are widely employed to expand in situ 
measurements of PM2.5 concentrations to large areas. LUR is essentially an interpolation technique that 
employs the PM2.5 concentrations as the dependent variable, with proximate land use, traffic and 
physical environmental variables used as independent predictors [7,8]. Some studies suggest
that PM emissions in urban areas come from road traffic, household activities, energy production,
building work, (inland) shipping and (small-scale) industry [9]. However, the lack of long-term
monitoring data is the primary obstacle in developing countries where PM2.5 concentration sites are 
sparse and have only been established in recent years. Fortunately, recent studies have indicated that 
satellite-observed total-column aerosol optical depth (AOD) could offer spatially continuous 
information about PM2.5 concentrations at the global scale [10]. Mao et al. developed an 
AOD-enhanced space-time LUR model to predict the PM2.5 concentrations in the state of Florida in the 
USA [11]. Van Donkelaar et al. presented a global estimate of PM2.5 concentrations at 10 km × 10 km resolution
for six years (2001-2006) by combining satellite-derived AOD with in situ measurements [12]. In June
2013, the Global Annual PM2.5 Grids were published by Battelle Memorial Institute and the Center for 
International Earth Science Information Network (CIESIN)/Columbia University; this data set 
represents a series of annual average grids (2001-2010) [13]. The global, 0.5° × 0.5° grid of estimated
PM2.5 concentrations was developed using monthly AOD data from MODIS and MISR for the period 
2001-2010; these estimates leveraged the AOD/PM2.5 surface level conversion factors calculated by 
van Donkelaar [12] and applied them to the gridded remote sensing data. The gridded product provides 
a continuous surface of PM2.5 concentrations in micrograms per cubic meter for health and
environmental research. 

The objective of this paper is to explore the spatio-temporal characteristics and driving forces of 
PM2.5 concentrations in China based on long-term newly refined data. Annual Average PM2.5 gridded 
data, land use data, gridded population data and Gross Domestic Product (GDP) data for the period 
2001-2010 were used in the analysis. The contributions of urban areas, high population and economic 
development to PM2.5 concentrations were analyzed using the Geographically Weighted Regression 
(GWR) model. 

2. Data Acquisition 

2.1. PM2.5 Data 

The Global Annual Average PM2.5 Grids are a series of annual average PM2.5 grids for 2001-2010.
These data were obtained from the Battelle Memorial Institute and the Center for International
Earth Science Information Network (CIESIN)/Columbia University. Each file obtained from
Battelle/CIESIN contains integer values for a global, 0.5° × 0.5° grid of estimated PM2.5 concentrations.
The average annual PM2.5 concentration for each grid cell was calculated by multiplying the MODIS 
and MISR mean AOD for each month by the monthly conversion factor, as in Equation (1) [13]: 

$$E_i = \frac{1}{n} \sum_{m=1}^{n} AOD_{i,m} \times \eta_{i,m} \qquad (1)$$

where $E_i$ stands for the annual average estimated PM2.5 concentration for grid cell $i$;
$AOD_{i,m}$ stands for the MODIS and MISR mean AOD for month $m$; $\eta_{i,m}$ stands for the monthly
conversion factor; and $n$ is the number of months averaged.
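
The calculation described for Equation (1) can be illustrated with a small sketch. It assumes the monthly AOD and conversion factors are available as arrays of shape (months, rows, columns) and that months without data are marked as NaN and skipped; the array names and the numbers are hypothetical and are not taken from the Battelle/CIESIN product.

```python
import numpy as np

def annual_pm25(aod, eta):
    """Annual mean PM2.5 per grid cell from monthly AOD and conversion factors.

    aod, eta: arrays of shape (months, rows, cols); NaN marks a missing month.
    """
    monthly = aod * eta                   # monthly PM2.5 estimate per cell
    return np.nanmean(monthly, axis=0)    # average over the months that have data

# Hypothetical 3-month series on a 1 x 2 grid
aod = np.array([[[0.2, 0.5]], [[0.4, np.nan]], [[0.6, 0.7]]])
eta = np.full_like(aod, 50.0)             # made-up conversion factor
pm25 = annual_pm25(aod, eta)
```

Skipping missing months rather than treating them as zero is a design choice of this sketch; the source does not state how gaps were handled.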

The PM2.5 dataset offers spatially continuous information about PM2.5 concentrations at the global
scale, but it also carries some uncertainty. The PM2.5 concentrations derived by Battelle/CIESIN may be
biased: they can be higher or lower than those of van Donkelaar et al. in some regions,
such as largely arid and semi-arid countries with large desert areas [13]. The reasons for these
differences are unclear. Uncertainties or limitations in the AOD data, the computing methods or
other factors could all contribute to these biases.

Data for China were extracted from the global dataset using the ArcGIS software and were 
transformed to the same coordinate system as the other datasets, specifically the Albers Equal Area 
projection system, Beijing 1954 geodetic datum and Krassovsky ellipsoid. Figure 1 shows the 
estimated distribution of PM2.5 concentrations in China from 2001 to 2010. 

2.2. Population Data 

Gridded population data for China with a spatial resolution of 1 km from 2001 to 2010 were provided
by the Resources and Environmental Scientific Data Center (RESDC), Chinese Academy of Sciences
(CAS) [14]. These gridded population data were transformed from census data based on the
relationship between demographic data and land use types, and the population data were
redistributed onto 1 km × 1 km grids [14].

Figure 1. The estimated distribution of PM2.5 concentrations in China from 2001 to 2010.




2.3. GDP Data

GDP is a commonly used indicator of economic development. The gridded GDP dataset for China 
from 2001 to 2010 was adopted in this study. The dataset was obtained from RESDC, CAS. 
The statistical GDP data at the county level were transformed into gridded data at a resolution of 
1 km × 1 km based on the relationship between the GDP data and the land use types [15].

2.4. Land-use Data

Land-use data for China at a scale of 1:100,000 for the years 2001 and 2010 were used in this study. 
The datasets were derived from Landsat TM (Thematic Mapper) and China-Brazil Earth
Resources Satellite (CBERS-2) images, which were interpreted by experts at RESDC,
Chinese Academy of Sciences. The following six land use types were identified: (1) cultivated land;
(2) woodland; (3) grassland; (4) water; (5) urban and rural settlements; and (6) barren land. Areas of
urban sprawl were derived from the land use data. A set of field survey data was used to
verify the accuracy of the land use classification, and this is the most accurate land use dataset at this
scale in China [16]. Before further processing, all of the source data were resampled onto a raster
dataset with 1 km spatial resolution and transformed into the same coordinate system.

3. Methodology

The spatio-temporal variations of the PM2.5 concentrations and their relationships with socioeconomic 
factors were evaluated using the following steps: 

Step 1: Evaluate the spatio-temporal variation of PM2.5 concentrations in China from 2001 to 2010
based on annual average PM2.5 grids.

Step 2: Compare the distribution of PM2.5 concentrations with each of the following factors: urban areas,
population and GDP. The impact of each factor on the PM2.5 concentrations was analyzed
and compared.

Step 3: Use the GWR method to evaluate the relationships between the PM2.5 concentrations and
the urban areas, population and GDP.

A conventional regression method, such as ordinary least squares (OLS), is a type of global statistic 
that assumes that the relationship under study is constant over space and therefore assumes that the 
parameter is the same for the entire study area [17]. The GWR model extends the traditional standard 
regression framework to estimate local, rather than global, parameters [18]. The GWR model is a type 
of local statistic that produces a set of local parameter estimates that show how a relationship varies 
over space. This visualization enables examining the spatial pattern of the local statistics to gain a 
better understanding of possible hidden causes for that pattern [19]. The global regression model can 
be expressed as follows: 

$$Y_i = a_0 + \sum_{k} a_k x_{ik} + \varepsilon_i \qquad (2)$$

We can obtain the vector estimates of the parameters using OLS: 

$$\hat{a} = (X^{T} X)^{-1} X^{T} Y \qquad (3)$$

where $\hat{a}$ is the vector of parameter estimates and $X$ is the matrix composed of the observed values
of the independent variables; the elements of the first column of $X$ are all 1. $Y$ is composed of the observed
values of the dependent variable [20].
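
As a minimal sketch of the OLS estimate in Equation (3), the following NumPy snippet prepends the column of ones to X and solves the normal equations. The data are synthetic stand-ins for the three explanatory variables, and the function name `ols` is chosen only for illustration.

```python
import numpy as np

def ols(X, y):
    """Estimate a = (X^T X)^{-1} X^T y, with an intercept column prepended."""
    X1 = np.column_stack([np.ones(len(X)), X])   # first column of ones
    # Solve the normal equations instead of forming an explicit inverse
    return np.linalg.solve(X1.T @ X1, X1.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # stand-ins for population, GDP, urban area
y = 5.0 + X @ np.array([1.5, 0.8, 2.0])          # noise-free, so the fit recovers the inputs
a_hat = ols(X, y)
```

Solving the linear system is numerically safer than inverting X^T X directly, although both express the same estimator.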

The GWR method estimates the parameters locally, and the model is extended to Equation (4):

$$Y_i = a_0(u_i, v_i) + \sum_{k} a_k(u_i, v_i) x_{ik} + \varepsilon_i \qquad (4)$$

where $(u_i, v_i)$ is the spatial coordinate of sample point $i$ and $a_k(u_i, v_i)$ is the value of the continuous
function $a_k(u, v)$ at point $i$. If $a_k(u, v)$ remains unchanged in space, model (4) reduces to
the global regression model (2). The GWR estimator therefore accounts for the spatial variability of the
relationship:

$$\hat{a}(u_i, v_i) = (X^{T} W(u_i, v_i) X)^{-1} X^{T} W(u_i, v_i) Y \qquad (5)$$

where $W(u_i, v_i)$ is the distance weight matrix [20,21].
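
The local estimator in Equation (5) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the ArcGIS implementation used in the study: it uses a fixed-bandwidth Gaussian kernel for the weights, and the function name, bandwidth value and synthetic data are all hypothetical.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Local estimates (X^T W_i X)^{-1} X^T W_i y at every sample point.

    A fixed Gaussian kernel is assumed here purely for illustration.
    """
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    coefs = np.empty((n, X1.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distances to point i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # diagonal of W(u_i, v_i)
        XtW = X1.T * w                                   # X^T W without building W
        coefs[i] = np.linalg.solve(XtW @ X1, XtW @ y)
    return coefs

rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(300, 2))
x = rng.normal(size=(300, 1))
slope = 1.0 + 0.2 * coords[:, 0]           # the relationship drifts from west to east
y = 2.0 + slope * x[:, 0]
local = gwr_coefficients(coords, x, y, bandwidth=1.5)
```

Each row of `local` holds the intercept and slope estimated at one location, so mapping the second column shows how the relationship varies over space, which a single global OLS coefficient cannot capture.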

4. Results and Analysis

4.1. Spatio-temporal Variation of PM2.5 Concentrations in China

The World Health Organization (WHO) guideline for the annual average PM2.5
concentration is 10 μg/m³ [22]. Illness rates increase markedly when the
annual average concentration reaches 35 μg/m³ (the WHO Interim Target-1, IT-1, level). China's air quality standard
for the annual average PM2.5 concentration is also 35 μg/m³ [23]. Temporal profiles of the areas at each
PM2.5 concentration level from 2001 to 2010 are shown in Figure 2.

Figure 2. Temporal profiles of the areas in China for which PM2.5 concentrations are
below 10 μg/m³ or above 35 μg/m³ from 2001 to 2010.
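
The kind of area statistic plotted in Figure 2 can be sketched as follows. The snippet assumes a PM2.5 raster and a matching array of cell areas (constant on an equal-area grid); the function name and the tiny example grid are hypothetical, and only the two thresholds come from the text.

```python
import numpy as np

def area_shares(pm25, cell_area, low=10.0, high=35.0):
    """Fraction of the valid area below `low` and above `high` (in μg/m³)."""
    valid = ~np.isnan(pm25)                       # ignore cells without data
    total = cell_area[valid].sum()
    below = cell_area[valid & (pm25 < low)].sum() / total
    above = cell_area[valid & (pm25 > high)].sum() / total
    return below, above

# Hypothetical 2 x 2 grid with one missing cell and equal cell areas
pm25 = np.array([[5.0, 20.0], [40.0, np.nan]])
cell_area = np.ones_like(pm25)
below, above = area_shares(pm25, cell_area)
```

Repeating the call for each annual grid would give one pair of shares per year, which is the form of a temporal profile.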

Figure 3 shows the regions whose concentrations exceeded 35 μg/m³ in China for the years
2001 and 2010. Figures 4-6 show the distribution of population, GDP and urban area in China for the
same years. Together, these maps show the spatial variation of PM2.5, population, GDP and
urban areas in China for 2001 and 2010. The area at the IT-1 level defined by the WHO (annual mean
PM2.5 concentration in excess of 35 μg/m³) increased slowly, by 7.2%/a on average, from 2001 to 2007
and then decreased by 7.5%/a on average from 2007 to 2010.

We hypothesize that larger populations, higher GDP levels and larger urban areas may each
lead to higher PM2.5 concentrations. We therefore built a regression model to
study the correlation between the PM2.5 concentrations and the population, GDP and urban area in
China for 2001 and 2010 and to evaluate the determinants of the increase in PM2.5 concentrations.

Figure 3. WHO target levels for PM2.5 concentrations in regions of China for (a) 2001
and (b) 2010.

Figure 4. Population distribution in China for (a) 2001 and (b) 2010.

Figure 5. GDP distribution in China for (a) 2001 and (b) 2010.



4.2. Correlation between PM2.5 Concentrations and Socioeconomic Issues

Before running the geographically weighted regression, an initial statistical analysis was used to determine
the characteristics of each of the variables proposed for the model. We used linear regression models to
examine the correlation between PM2.5 and each variable. The summary statistical results are shown in
Table 1. All of the associations with PM2.5 are in the expected direction.



Table 1. Summary of the initial statistical analysis of PM2.5 against each variable.

Variable     | 2001 R* | 2001 Correlation | 2010 R* | 2010 Correlation
Population   | 0.41    | positive         | 0.50    | positive
GDP          | 0.59    | positive         | 0.58    | positive
Urban area   | 0.59    | positive         | 0.59    | positive

Notes: * R is the correlation coefficient between PM2.5 and population,
GDP and urban area. All results are statistically significant;
the P value of each regression is less than 0.05.



We used the Variance Inflation Factor (VIF) to check for collinearity
among the indicators. Using SPSS software, we found that the VIF values of population,
GDP and urban area are 2.79, 2.58 and 2.12, respectively, for 2001, and 3.63, 3.60 and 2.31, respectively, for 2010.
Because all VIF values are well below the commonly used threshold of 10, collinearity is not a
concern, even though there may be some correlation among the three factors.
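
For readers who want to reproduce this kind of collinearity check outside SPSS, the sketch below computes the VIF of each predictor as 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors. The data are synthetic and the function name `vif` is hypothetical; the values it produces are unrelated to those reported in the paper.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X: 1 / (1 - R_j^2)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        # Regress column j on an intercept plus all other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1.0 - resid.var() / X[:, j].var()
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(2)
base = rng.normal(size=500)
X = np.column_stack([
    base + 0.8 * rng.normal(size=500),   # stand-in for population
    base + 0.8 * rng.normal(size=500),   # stand-in for GDP, correlated with the first
    rng.normal(size=500),                # stand-in for urban area, independent here
])
values = vif(X)
```

A VIF of 1 means a predictor is uncorrelated with the others; the two correlated columns in this example come out above 1, while the independent one stays close to 1.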

The GWR model was run with ArcGIS 9.3 software. We built a GWR model of the relationship
between PM2.5 concentrations and population, GDP and urban area in China for 2001 and 2010.
The GWR model produces a set of local regression results, including local parameter estimates and
local residuals, which can be mapped to show their spatial variability. In this study, we chose an
ADAPTIVE kernel whose bandwidth is found by minimizing the corrected Akaike Information
Criterion (AICc) value.
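
The idea of choosing an adaptive bandwidth by minimizing AICc can be sketched as below. This is not the ArcGIS routine: it assumes a Gaussian kernel whose local bandwidth is the distance to the k-th nearest neighbour, uses the commonly cited GWR form AICc = n ln(σ̂²) + n ln(2π) + n (n + tr(S)) / (n - 2 - tr(S)) with σ̂² = RSS / n, and searches a small hand-picked list of candidate k values. All names and data are hypothetical.

```python
import numpy as np

def gwr_aicc(coords, X, y, k):
    """AICc of a GWR fit whose local bandwidth is the distance to the k-th neighbour."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    fitted = np.empty(n)
    trace_s = 0.0
    for i in range(n):
        h = np.sort(dist[i])[k]                        # index 0 is the point itself
        w = np.exp(-0.5 * (dist[i] / h) ** 2)          # Gaussian weights
        XtW = X1.T * w
        row = X1[i] @ np.linalg.solve(XtW @ X1, XtW)   # i-th row of the hat matrix S
        fitted[i] = row @ y
        trace_s += row[i]
    sigma2 = np.sum((y - fitted) ** 2) / n
    return n * np.log(sigma2) + n * np.log(2 * np.pi) + n * (n + trace_s) / (n - 2 - trace_s)

def select_neighbours(coords, X, y, candidates):
    """Return the candidate k with the smallest AICc, plus all scores."""
    scores = {k: gwr_aicc(coords, X, y, k) for k in candidates}
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(3)
coords = rng.uniform(0, 10, size=(150, 2))
x = rng.normal(size=(150, 1))
y = 2.0 + (1.0 + 0.3 * coords[:, 0]) * x[:, 0] + rng.normal(scale=0.3, size=150)
best_k, scores = select_neighbours(coords, x, y, candidates=[10, 20, 40, 80, 149])
```

A small k follows local variation closely but spends more effective parameters (a larger trace of S), while a large k approaches the global model; AICc balances the two.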

From the report created by the GWR tool, we can obtain the R² and the adjusted R².
The R² values for 2001 and 2010 are 0.820 and 0.822, respectively, and the adjusted R²
values are 0.810 and 0.815, respectively. The values for the two years are
very close, which suggests that the overall performance of the model is relatively high in both years.

The values of the standardized residual (StdResid) for 2001 and 2010 are mapped in Figure 7.
Not surprisingly, some unusually high or low residuals can be observed.
Regions containing desert have very large residuals (StdResid > 2). For example, the PM2.5
concentrations in the northwest region of Xinjiang are high because of the desert; Xinjiang is a zone with a high
incidence of dust storms. The concentrations of dust aerosol at altitude are closely related to
the surface conditions below; the concentrations of particles above desert areas are greater than
those above vegetation-covered areas [24]. Therefore, the high PM2.5 concentrations in desert regions are
mainly related to dusty weather. Southern Hebei province, northern Henan province and
northwest Shandong province also have much higher residuals because they are high pollution emission
regions of northern China. For example, the Shijiazhuang Iron and Steel Company discharges an average
of more than 2000 t/a of PM2.5, and there are also many polluting enterprises in the urban areas [25].
The pollution in these regions is consistently more serious than in other regions. In addition,
the Sichuan Basin has high residuals because of its high aerosol optical depth values. The optical depth of
the Sichuan Basin is higher than that of its surrounding areas due to its geographical and climatic characteristics;
its annual average optical depth is approximately 0.7 [26]. PM2.5 has a strong positive correlation with
AOD [27], so the Sichuan Basin has high PM2.5 concentrations. Regions that are rich in marine salt can
also have high PM2.5 concentrations [28]. In all of these regions the residuals are large and positive, that is,
the model noticeably under-predicts PM2.5 concentrations [18]; this warrants closer inspection to discover
possible explanations.

However, the regions with StdResid values in the range of -2 to 2 account for 94.8% and 94.6% of the whole
country in the two years, which indicates that the relations between PM2.5 and each of the three factors are stable.
In addition, the regions with positive local coefficients for urban areas,
population and GDP account for 92.72%, 90.52% and 95.62% of the whole country, respectively, in 2001 and for 92.01%,
95.29% and 90.50%, respectively, in 2010. This agrees with our expectation
of the direction of influence of those variables.

Figure 7. Maps of standardized residuals from the GWR model in China for the years
(a) 2001 and (b) 2010.

5. Discussion

Most of the associations between the PM2.5 concentrations and the other variables considered are in the
expected direction. PM2.5 is positively correlated with population, GDP and urban area. We therefore have
reason to expect that regions with large populations, high GDP values and large urban areas
will have high PM2.5 values.

In China, PM2.5 mainly comes from human activities (motor vehicle exhaust and coal dust) and
dust storms (karaburan); the ground dust and secondary particles raised by dust storms also contribute to PM2.5
concentrations [29,30]. Human activities that strongly affect air quality are often sources
of PM2.5; therefore, cities with poor air quality tend to have high PM2.5 levels. Metropolitan areas
have large populations, high GDP values, high degrees of urbanization and industries that produce
contaminants [31]. Figures 3-6 show that high PM2.5 values are mainly concentrated in
regions with large populations, high GDP values and high degrees of urbanization. For example,
in Beijing, PM2.5 values have a linear relation with motor vehicle exhaust, coal dust and
dust storms [32-34]. Population growth and economic development are accelerating environmental
deterioration in Beijing [31]. In addition, some research shows that, in Tianjin and Chongqing,
PM2.5 concentrations depend on motor vehicle exhaust, coal dust and dust storms [35-37].
In some large cities, pollution from coal, other fuels and industry affects particulate
matter concentrations [38]. PM2.5 concentrations also depend on temperature, humidity and
rainfall in some regions [39]. Each area has its own leading factor that influences PM2.5 concentrations.
Future work might include partitioning the different factors impacting PM2.5 concentrations.

6. Conclusions 

In this study, spatio-temporal characteristics and factors impacting PM2.5 concentrations in China 
for the years 2001-2010 were evaluated based on newly refined long-term data. The following main 
conclusions are reached: 

(1) In general, the spatial pattern of PM2.5 concentrations in China remained stable during the
period 2001-2010. The area at the IT-1 level defined by the WHO (annual mean PM2.5
concentration in excess of 35 μg/m³) increased slowly, by 7.2%/a on average, from 2001 to 2007 and then
decreased by 7.5%/a on average from 2007 to 2010.

(2) PM2.5 is mostly concentrated in regions with high populations, high GDP and large urban areas,
including the Beijing-Tianjin-Hebei region in North China, East China (including the Shandong,
Anhui and Jiangsu provinces) and Henan province. The Sichuan Basin is one exception to this result.

This paper presents, for the first time, a comprehensive insight into the spatio-temporal
characteristics of PM2.5 concentrations in China at the national scale. However, the problem is complex
and needs further attention.